Slope |
---|
-0.5590 |
-0.5761 |
-1.0187 |
-0.8451 |
Usually Zipf’s law compares word frequency with word rank. Here we consider:
Significance vs. rank (drawn in red): Order all co-occurrence pairs by significance and plot the significance against the rank so defined.
Frequency vs. rank (green): As above, but use frequency instead of significance.
Number of right neighbors per word vs. rank (blue): Much like the original Zipf’s law, but use the number of right neighbors of a word instead of its frequency.
Number of left neighbors per word vs. rank (pink): As above, but left instead of right neighbors.
The slopes of the curves may serve as language parameters.
In particular, comparing the last two curves (right vs. left neighbors) may reveal a difference between left and right neighbors for some languages.
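As a reminder of why a slope can act as a parameter: assuming the curves are drawn on doubly logarithmic axes, as is usual for Zipf plots (the text here does not say so explicitly), a power law appears as a straight line whose slope is the negative exponent. The symbols y (plotted quantity), r (rank), C and α are notation introduced only for this note.

$$
y(r) = C\, r^{-\alpha} \quad\Longrightarrow\quad \log y(r) = \log C - \alpha \log r
$$

Under that reading, a fitted slope of -1.0187 would correspond to α ≈ 1.02.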
Significance vs. rank:
SELECT @pos:=(@pos+1), round(sig) FROM (SELECT @pos:=0) r, co_n WHERE w1_id>100 and w2_id>100 order by round(sig) desc;
Frequency vs. rank:
SELECT @pos:=(@pos+1), freq FROM (SELECT @pos:=0) r, co_n WHERE w1_id>100 and w2_id>100 order by freq desc;
Number of right neighbors per word vs. rank:
SELECT @pos:=(@pos+1), xx.cnt FROM (SELECT @pos:=0) r, (select count(*) as cnt from co_n WHERE w1_id>100 and w2_id>100 group by w1_id order by cnt desc) xx;
Number of left neighbors per word vs. rank:
SELECT @pos:=(@pos+1), xx.cnt FROM (SELECT @pos:=0) r, (select count(*) as cnt from co_n WHERE w1_id>100 and w2_id>100 group by w2_id order by cnt desc) xx;
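For readers without access to the database, the logic of the four queries can be mirrored in a few lines of Python. This is only a sketch: the sample rows are invented, the tuple layout (w1_id, w2_id, freq, sig) is an assumption about the columns of co_n used above, and the helper name ranked is hypothetical.

```python
from collections import Counter

# Invented sample rows standing in for co_n: (w1_id, w2_id, freq, sig).
rows = [
    (101, 102, 12, 35.4),
    (101, 103, 7, 18.6),
    (102, 103, 7, 22.1),
    (104, 101, 3, 9.8),
    (50, 101, 40, 80.0),  # filtered out below because w1_id <= 100
]

def ranked(values):
    """Sort values in descending order and attach ranks 1, 2, 3, ...

    This plays the role of the @pos counter in the queries.
    """
    return list(enumerate(sorted(values, reverse=True), start=1))

# Same filter as the WHERE clause: w1_id > 100 and w2_id > 100.
kept = [r for r in rows if r[0] > 100 and r[1] > 100]

# Note: Python's round() may treat exact .5 ties differently from MySQL's.
sig_vs_rank = ranked(round(sig) for _, _, _, sig in kept)
freq_vs_rank = ranked(freq for _, _, freq, _ in kept)

# Grouping by w1_id counts the right neighbors of each word,
# grouping by w2_id counts the left neighbors.
right_vs_rank = ranked(Counter(w1 for w1, _, _, _ in kept).values())
left_vs_rank = ranked(Counter(w2 for _, w2, _, _ in kept).values())

print(sig_vs_rank)
print(right_vs_rank)
```

Each resulting list holds (rank, value) pairs, which is the shape of the data behind each of the four curves.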
How to calculate the slope?
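One common way to obtain such a slope, offered here as an assumption and not as the method actually used for the values reported in this section, is an ordinary least-squares fit of log(value) against log(rank). The function name loglog_slope and the synthetic test data are hypothetical.

```python
import math

def loglog_slope(values):
    """Least-squares slope of log(value) against log(rank).

    `values` is assumed to be sorted in descending order, so the item at
    index i has rank i + 1. Non-positive values are skipped because their
    logarithm is undefined.
    """
    points = [(math.log(rank), math.log(v))
              for rank, v in enumerate(values, start=1) if v > 0]
    n = len(points)
    if n < 2:
        raise ValueError("need at least two positive values")
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx

# Synthetic check: an exact power law value = 1000 / rank should give a
# slope very close to -1.
ideal = [1000.0 / r for r in range(1, 201)]
slope = loglog_slope(ideal)
print(round(slope, 4))
```

The choice of logarithm base does not affect the slope, since both axes are scaled by the same constant. On real data the head and tail of the curve often deviate from a straight line, so the range of ranks included in the fit can change the result noticeably.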
Why are the green and red curves parallel? Cf. 5.2.9.
5.2.9 Zipf's law for Sentence co-occurrences